Uncovering Heterogeneity Using Tree-Based Machine Learning

paper presentation

3/20/23

Tree-based ML

Code
data(iris)

iris %>% 
  select(Sepal.Length, Sepal.Width, Species) %>% 
  head() %>% 
  kable()
Sepal.Length  Sepal.Width  Species
5.1           3.5          setosa
4.9           3.0          setosa
4.7           3.2          setosa
4.6           3.1          setosa
5.0           3.6          setosa
5.4           3.9          setosa

Tree-based ML

  • The maximum value of Gini(Ω) is 1/2, attained when p = 0.5, i.e., when the sample is evenly split between the two classes.
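The two-class impurity above can be sketched in base R (the function name `gini` is ours, not from the slides):

```r
# Gini impurity for a two-class node with class-1 share p:
# Gini = 1 - p^2 - (1 - p)^2 = 2 * p * (1 - p)
gini <- function(p) 2 * p * (1 - p)

gini(0.5)  # 0.5: maximum, classes evenly split
gini(0.0)  # 0:   pure node
gini(0.9)  # 0.18: node dominated by one class
```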

Heterogeneous Treatment Effects

  • Assumptions:
    • Unconfoundedness: conditional on observable characteristics, treatment assignment is as good as random (no unobserved confounders).
    • Overlap: at every point of the covariate space we can find both treated and control individuals (propensity scores bounded away from 0 and 1).
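A quick way to eyeball the overlap assumption is to estimate propensity scores and check their range; a base-R sketch on simulated data (all variable names and coefficients are illustrative, not from the slides):

```r
# Simulate covariates and a treatment whose probability depends on them
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
treatment <- rbinom(n, 1, plogis(0.3 * x1 - 0.2 * x2))

# Estimate propensity scores with a logistic regression
ps <- fitted(glm(treatment ~ x1 + x2, family = binomial))

# Overlap is plausible if estimated propensities stay away from 0 and 1
range(ps)
```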

Causal Tree

  • Adaptation:
    • partition the data to target differences in potential outcomes (treatment effects) rather than differences in observed outcomes.
    • however, we never observe the causal effect for any individual unit, so the splitting criterion must work with estimated effects.
  • Honest Estimation:
    • split the sample: one subsample chooses the tree's splits, and a separate subsample estimates the treatment effects within the leaves.
    • performance of the tree is judged on treatment-effect heterogeneity rather than predictive accuracy for outcomes.
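The honest sample split can be sketched in base R (variable names are ours; the `causalTree` package handles this internally when `split.Honest = TRUE`):

```r
# Honest estimation: two disjoint halves of the sample
set.seed(2)
n   <- 500
idx <- sample(n, n / 2)

split_sample      <- seq_len(n) %in% idx  # chooses the tree's splits
estimation_sample <- !split_sample        # estimates leaf treatment effects

# The halves are disjoint, so leaf estimates are not overfit to the splits
sum(split_sample & estimation_sample)  # 0
```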

Causal Tree

Apply Causal Tree with simulation:

Code
tree <- simulation %>% 
  causalTree(
    y ~ x1 + x2 + x3 + x4,
    data = .,
    treatment = simulation$treatment,
    split.Rule = "CT",      # causal tree splitting criterion
    cv.option = "CT",       # cross-validate on the same criterion
    split.Honest = TRUE,    # honest splitting
    cv.Honest = TRUE,       # honest cross-validation
    split.Bucket = FALSE, 
    xval = 5,               # 5-fold cross-validation
    cp = 0, 
    minsize = 20,           # minimum treated/control units per leaf
    propensity = 0.5        # known treatment probability
  )
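The `simulation` data frame is not defined in the slides; one hypothetical data-generating process it could come from, with a treatment effect that varies in `x1` (all coefficients are illustrative):

```r
# Hypothetical DGP for the `simulation` data frame used above
set.seed(3)
n <- 1000
simulation <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n)
)
simulation$treatment <- rbinom(n, 1, 0.5)  # consistent with propensity = 0.5

# Effect of treatment is 2 when x1 > 0 and 0 otherwise,
# so the causal tree has real heterogeneity to uncover
tau <- ifelse(simulation$x1 > 0, 2, 0)
simulation$y <- simulation$x2 + tau * simulation$treatment + rnorm(n)
```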

Tree plot:

Code
rpart.plot(tree)

Empirical Application

  • Question: does attending college heterogeneously reduce low-wage work across subgroups defined by different variables?

  • Shiny app